Abstract

We resolve an open question from (Christiano, 2014b) posed in COLT’14 regarding the optimal dependency of 
the regret achievable for online local learning on the size of the label set. In this framework, the algorithm is shown a 
pair of items at each step, chosen from a set of n items. The learner then predicts a label for each item, from a label 
set of size L and receives a real valued payoff. This is a natural framework which captures many interesting scenarios 
such as online gambling and online max cut. (Christiano, 2014a) designed an efficient online learning algorithm for
this problem achieving a regret of $O(\sqrt{nL^3T})$, where T is the number of rounds. Information theoretically, one can
achieve a regret of $O(\sqrt{n \log L \cdot T})$. One of the main open questions left in this framework concerns closing the above
gap.

In this work, we provide a complete answer to the question above via two main results. We show, via a tighter 
analysis, that the semi-definite programming based algorithm of (Christiano, 2014a) in fact achieves a regret of
$O(\sqrt{nLT})$.

Second, we show a matching computational lower bound. Namely, we show that a polynomial time algorithm 
for online local learning with lower regret would imply a polynomial time algorithm for the planted clique problem 
which is widely believed to be hard. We prove a similar hardness result under a related conjecture concerning planted 
dense subgraphs that we put forth. Unlike planted clique, the planted dense subgraph problem does not have any 
known quasi-polynomial time algorithms. 

Computational lower bounds for online learning are relatively rare, and we hope that the ideas developed in this 
work will lead to lower bounds for other online learning scenarios as well. 


1 Introduction 

Online learning is a classic area of machine learning starting from the seminal work of (Littlestone and Warmuth,
1994), (DeSantis et al., 1988) and (Vovk, 1990). In this framework, also known as “prediction from expert advice”,
the learning algorithm has to predict label information about an item or a set of items at each stage. It then earns a real 
valued payoff which is a function of the predicted labels. The aim is to achieve a total payoff in T rounds comparable 
to the best expert, i.e., the best fixed labeling of the items. The difference from the best possible payoff is known as 
the regret of the algorithm. 

The weighted majority algorithm (Littlestone and Warmuth, 1994) achieves the optimal regret of $O(\sqrt{T \log N})$ for
the above mentioned problem (T is the number of rounds, N is the total number of experts) but is computationally efficient
only when the number of experts is small. In many scenarios, one is competing with a set of exponentially many
experts. Hence, there has been a significant effort in designing polynomial time algorithms with optimal regret bounds
for various such problems such as collaborative filtering, online gambling, and online max cut ((Kalai and Vempala,
2005), (Hazan et al., 2012), (Kakade et al., 2009), (Hazan, 2009)).

A common aspect of many online learning scenarios mentioned above is that at each time step, the learner is
asked to predict local information about items. For instance, in the online max cut problem, the learner has to predict 
whether any two nodes are on the same side of the cut or on opposite sides. Recently, (Christiano, 2014a) proposed an 
elegant unifying framework called online local learning to capture such problems. 

In this framework, one is given a set of n items, numbered 1 to n. In each round $t \in [T]$, the learner gets a pair
of items $(i_t, j_t)$ as input and has to reply with a pair of labels $(l_{i_t}, l_{j_t})$, where the possible labels are in $[L]$. Then,
an adversary picks a payoff function $P^t : [L]^2 \to [-1, 1]$. The goal is to compete with the best fixed labeling. More
precisely, if we denote

$$\mathrm{OPT} = \max_{l : [n] \to [L]} \sum_{t=1}^{T} P^t(l(i_t), l(j_t))$$

and the algorithm achieves expected payoff $\mathrm{OPT} - r$, the algorithm has regret r, where the expectation is over the
algorithm's randomness.
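The definition of OPT and regret above can be made concrete on a toy instance. The following is a minimal sketch, assuming payoff functions are given as L-by-L tables and that the learner's predictions are deterministic (so no expectation is needed); the function names and the max-cut style payoff table are illustrative choices, not taken from the paper. The brute-force search over all $L^n$ labelings is only feasible for tiny n.

```python
import itertools

def best_fixed_labeling_payoff(n, L, rounds):
    """Brute-force OPT: max over all labelings l: [n] -> [L] of sum_t P_t(l(i_t), l(j_t))."""
    best = float("-inf")
    for labeling in itertools.product(range(L), repeat=n):
        total = sum(P[labeling[i]][labeling[j]] for i, j, P in rounds)
        best = max(best, total)
    return best

def regret(n, L, rounds, predictions):
    """Regret of a fixed sequence of predicted label pairs against the best fixed labeling."""
    earned = sum(P[a][b] for (i, j, P), (a, b) in zip(rounds, predictions))
    return best_fixed_labeling_payoff(n, L, rounds) - earned

# Hypothetical toy instance: 3 items, 2 labels, payoff +1 when the two labels differ
# and -1 when they agree (a max-cut style payoff on a triangle).
cut = [[-1.0, 1.0], [1.0, -1.0]]
rounds = [(0, 1, cut), (1, 2, cut), (0, 2, cut)]
predictions = [(0, 0), (0, 0), (0, 0)]  # a learner that always answers "same side"
```

On this triangle any labeling cuts at most two of the three edges, so the best fixed labeling earns 1, while the constant predictions earn -3.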

The main result of (Christiano, 2014a) is that the well known “Follow-the-regularized-leader” algorithm with an
appropriate regularizer achieves regret $O(\sqrt{nL^3T})$ for the online local learning problem. This, in particular, leads
to optimal regret bounds$^1$ for the online max cut problem. Notice that as mentioned before, one can get the optimal
regret of $O(\sqrt{n \log L \cdot T})$ via an inefficient algorithm which runs the weighted majority algorithm over the space of
all possible labelings.

One of the main questions left open in this framework was to close the gap between the regret that can be achieved 
by an efficient algorithm and the information theoretically optimal regret. We close this gap by proving the following 
results (formal statements appear later). On the lower bound side, we prove: 

Theorem 1 (Informal). For every $\epsilon > 0$, if there exists an algorithm for online local learning achieving regret
$O(\sqrt{nL^{1-\beta(\epsilon)}T})$ and running in time polynomial in n, L, T, then in polynomial time, one can distinguish an instance
of a random graph $G(n, 1/2)$ from an instance of $G(n, 1/2)$ with a randomly planted clique of size $n^{1/2-\epsilon}$. Here,
$\beta(\epsilon)$ is a function such that $\lim_{\epsilon \to 0} \beta(\epsilon) = 0$.

We also prove a similar lower bound under a more robust conjecture concerning planting dense subgraphs which 
we introduce, which has no known quasipolynomial time algorithms, unlike planted clique. We show: 

Theorem 2 (Informal). For every $\epsilon, \epsilon' > 0$, if there exists an algorithm for online local learning achieving regret
$O(\sqrt{nL^{1-\beta(\epsilon,\epsilon')}T})$ and running in time polynomial in n, L, T, then in polynomial time, one can distinguish between
an instance of $G(n, p)$ and an instance of $G(n, p)$ with a randomly planted instance of $G(k, q)$. Here, k, q depend on
$\epsilon, \epsilon'$, and $\beta(\epsilon, \epsilon')$ is a function such that $\lim_{\epsilon, \epsilon' \to 0} \beta(\epsilon, \epsilon') = 0$.

We match the above lower bounds with the following theorem: 

Theorem 3 (Informal). For the online local learning problem, follow the regularized leader with an appropriate
regularizer achieves regret $O(\sqrt{nLT})$.

Jointly these results are meaningful for multiple reasons. First and foremost, online local learning is the most 
natural generalization of constraint satisfaction problems (CSPs) to the online setting. The semidefinite relaxation 
upon which Theorem 3 is based is the same one considered in (Raghavendra, 2008), who proves that under the Unique 
Games Conjecture, it actually achieves the best approximation factor among all polynomial-time algorithms. Our 
result can be viewed as an extension of (Raghavendra, 2008): for the online version of CSPs, follow-the-regularized-leader
on the same semidefinite relaxation along with a log determinantal regularizer is the “optimal” algorithm,
under widely believed conjectures. Furthermore, while our hardness reduction is specific to the setting of online local
learning, given the paucity of lower bounds in the setting of online learning, our result is a significant contribution and
we hope it will find applications in other settings as well. Finally, a labeling of the items with k labels can be also
viewed as a k-partitioning of the items. So, all the above results can be viewed through the lens of online settings for
k-partitioning.

$^1$ Up to constant factors.

1.1 Techniques

We obtain the above mentioned upper bound on the regret by showing that “Follow-the-regularized-leader” using the 
same regularizer as (Christiano, 2014a) achieves the regret bound we are claiming, but with a completely different 
analysis. There, the idea is to express the entropy of a multivariate Gaussian in terms of the log-determinant of its 
covariance matrix, and that two multivariate Gaussian distributions that differ by a small amount in their covariance 
matrices cannot be too far in total variation distance as well. The main reason for this approach in (Christiano, 2014a) 
is that the Hessian of the log-determinantal regularizer is not diagonal, so it’s difficult to argue about its inverse. We 
use the special structure of the regularizer to get explicit expressions for the inverse, which allows us then to use more 
standard tools from convex geometry for analyzing “Follow-the-regularized-leader”. To do this we use some matrix 
calculus identities, which we think might be useful in other machine learning applications, where one needs to perform 
regularized optimization over polytopes of pseudo-moments. 

Our lower bounds are based on two conjectures about detecting planted dense structures inside random graphs. The 
first one is planted clique, which states that detecting planted cliques of sufficiently small size in an Erdos-Renyi graph 
cannot be done in polynomial time. We introduce a more robust version of this conjecture, planted dense subgraph, 
which concerns detecting planted dense Erdos-Renyi graphs inside sparser ones. While our reductions are similar in 
both cases, the state of the art algorithms for this detection problem are much worse. This is an indicator that this 
problem is likely harder and gives even stronger evidence for the hardness of achieving low regret. The proof idea is 
to use the online learner as an estimator of the size of the largest clique or dense subgraph in a graph, and the regret as 
the rate of error in this estimator. We show that if the rate is low enough, then one can distinguish between the planted 
and non-planted case. See next section for further details. 

1.2 The planted dense subgraph and planted clique problems 

We will review the planted clique conjecture and describe the dense subgraph conjecture, upon which we will be 
basing our lower bounds. 

1.2.1 Planted clique 

In the planted clique problem, one is given a graph sampled from one of two possible random ensembles: an Erdos-Renyi
random graph $G(n, 1/2)$, or an Erdos-Renyi random graph $G(n, 1/2)$ along with a clique of size k placed
between k randomly chosen vertices in the graph. (The usual notation for this random ensemble is $G(n, 1/2, k)$.)
The task is to distinguish whether one is presented with a graph from the $G(n, 1/2)$ ensemble or the $G(n, 1/2, k)$
ensemble.

Previous sequences of work (Feldman et al., 2013), (Meka et al., 2015), show that wide classes of natural polynomial
time algorithms cannot efficiently distinguish between these two cases when the size of the planted clique
is $n^{1/2-\epsilon}$, and it is conjectured that in fact there is no polynomial time algorithm for this task. More precisely, the
conjecture is the following:

Conjecture 1. Suppose that an algorithm A receives as input a graph G, which is either sampled from the ensemble
$G(n, 1/2)$ or $G(n, 1/2, n^{1/2-\epsilon})$, for $\epsilon = \Omega(1)$. Then, no A which runs in polynomial time can decide, with constant
probability bounded away from $\frac{1}{2}$,$^2$ which ensemble the input was sampled from.

1.2.2 Planted dense subgraph 

The planted dense subgraph problem is a natural generalization of planted clique, where one again wants to distinguish 
between a random and a planted instance. In the planted case, we plant a denser graph inside a sparser one. Formally, 
let $G(n, p, k, q)$ be a random graph ensemble generated in the following manner. First, one picks a random subset S
of k vertices. Then, for all pairs of vertices inside S, one connects them with an edge independently with probability 
q. For all other pairs of vertices, we connect them independently with probability p. 
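The generative process just described can be sketched directly. This is a minimal illustration, assuming an adjacency-set representation and a caller-supplied `random.Random` instance; the function name is hypothetical. Setting p = 1/2 and q = 1 recovers the planted clique ensemble $G(n, 1/2, k)$ of the previous subsection.

```python
import random

def sample_planted_dense_subgraph(n, p, k, q, rng):
    """Sample from G(n, p, k, q): returns (adjacency sets, planted vertex set S)."""
    planted = set(rng.sample(range(n), k))  # the random subset S of k vertices
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Pairs inside S get an edge with probability q, all other pairs with probability p.
            prob = q if (u in planted and v in planted) else p
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj, planted
```

For example, `sample_planted_dense_subgraph(1000, 0.5, 30, 1.0, random.Random(0))` draws a planted clique instance with a reproducible seed.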

$^2$ The constant is arbitrary. One could make the conjecture for any constant bounded away from $\frac{1}{2}$.

The sizes and densities of the planted and ambient graph in which we will be interested are $p = n^{-\alpha}$, $k = n^{1/2-\epsilon'}$,
$q = k^{-\alpha-\epsilon}$ for $\alpha < \frac{1}{2}$. The main reason this scenario is interesting is that unlike planted clique, we do not
know of quasi-polynomial time algorithms for it.

To be formal, we conjecture the following: 

Conjecture 2. Suppose that an algorithm A receives as input a graph G, which is either sampled from the ensemble
$G(n, p)$ or $G(n, p, k, q)$, where $k = n^{1/2-\epsilon'}$ for $\epsilon' = \Omega(1)$; $p = n^{-\alpha}$ for $\alpha = \Omega(1)$, $\alpha < \frac{1}{2}$; $q = k^{-\alpha-\epsilon}$
for $\epsilon = \Omega(1)$; and $p = o(q)$. Then, no A which runs in polynomial time can decide with constant probability (bounded away from $\frac{1}{2}$) which ensemble
the input was sampled from.

There are a few ways to justify this conjecture. First, the current best known algorithm for this distinguishing problem,
from (Bhaskara et al., 2010), runs in time exponential in a polynomial of k. Since k is polynomial
in n, this gives a running time exponential in a polynomial of n, which is significantly worse than quasi-polynomial. Second, it's possible to show (Bhaskara et al., 2010) that
spectral methods do not work in this regime. It's also easy to check that simple algorithms like outputting the vertices
with highest degree do not work either, since the variance of the degree in the sparser ambient graph dominates the
degrees in the denser planted graph. Finally, similar conjectures to this have already been proposed in various contexts
in theoretical computer science ((Arora et al., 2010), (Applebaum et al., 2010)).

The fact that state of the art algorithms have a much worse running time for this problem in comparison to planted
clique is our motivation for putting forth this conjecture. Namely, our reduction of planted clique/planted dense
subgraph to online local learning will produce an online learning instance in which the number of items n', the
number of rounds T and the label set size L are all polynomial in the size of the input graph. Furthermore, the time
to produce the inputs for the learning algorithm will be polynomial as well. Therefore, if $N = \max(T, L, n')$, and
we have an algorithm of running time $f(N)$ for online local learning, we get an algorithm for planted clique/planted
dense subgraph of running time $\max(f(\mathrm{poly}(n)), \mathrm{poly}(n))$.

This means for instance, if our algorithm for online local learning has running time $f(N) = N^{O(\log N)}$, our
reduction would give an algorithm for planted clique with running time $n^{O(\log n)}$. A similar statement holds in the
planted dense subgraph case. If our algorithm for online local learning has running time even $f(N) = 2^{N^{o(1)}}$, the
reduction would give an algorithm better than the state of the art for planted dense subgraph.

2 Computational lower bounds on achievable regret 

We will proceed with the lower bound first. The overall strategy will be as follows. We will produce an online learning
instance from our input graph. In the planted case, there will be a fixed labeling which achieves a large payoff $b_p$, and
in the random case, we'll show that any algorithm (efficient or not) can achieve at most some small payoff $b_r$. The
reduction will ensure that if we can get a sufficiently low regret r in polynomial time, we will get a payoff of at least
$b_p - r$ in the planted case, such that $b_p - r > b_r$, with constant probability. Then to distinguish between planted and random,
we simply declare planted if the payoff is large enough, and random otherwise.

For both reductions, we will show a “robust” version of the bound first, e.g. for planted clique, we will show a
lower bound of $O(\sqrt{nL^{1-\beta(\epsilon)}T})$ if planted clique is hard when the size of the planted portion is $n^{1/2-\epsilon}$, for some
function $\beta(\epsilon)$. Then we will take the limit $\epsilon \to 0$. The details of the reduction follow.

2.1 Planted clique-based hardness 

Let us proceed to the planted clique-based lower bound first. We will show: 

Theorem 1. Let $\epsilon = \Omega(1)$. If regret $\sqrt{nL^{\beta}T}$ for $\beta = \left(1 - \omega\left(\frac{1}{\log n}\right)\right)\left(\frac{2}{1+2\epsilon}\right) - 1$ is achievable in time
polynomial in n, L, T, then one can distinguish between $G(n, 1/2)$ and $G(n, 1/2, n^{1/2-\epsilon})$ with constant probability$^3$ in
polynomial time.

$^3$ Again, the choice of the constant is arbitrary.

Proof. We produce an instance for the online local learning problem, given an instance of the planted clique problem
with size of the planted clique k in the following way.

We randomly partition the input graph into $n' = n/l$ clusters, each containing $l$ vertices, where $l = 10\frac{n}{k}$. Within its cluster, we
associate each vertex with a unique label in $[l]$. We then use this as an instance for the online learning problem
as follows. We run the online learning game for $T = \binom{n'}{2}$ steps. In each step t, we query a pair of clusters $(C_{i_t}, C_{j_t})$.
Each pair is queried once, and the ordering is arbitrary. The algorithm responds with some labeling for the clusters
$(l_{i_t}, l_{j_t})$, and the payoff is 1 if the vertex for $l_{i_t}$ in $C_{i_t}$ has an edge to the vertex for $l_{j_t}$ in $C_{j_t}$. Otherwise, the payoff is
0.
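The construction above can be sketched as follows. This is an illustrative sketch only: it assumes the graph is given as a list of adjacency sets, that $10n/k$ is an integer dividing n, and the names `build_instance` and `payoff` are hypothetical rather than taken from the paper.

```python
import random

def build_instance(adj, k, rng):
    """Randomly partition the vertices into n' = n/l clusters of size l = 10 n / k.

    Returns the clusters (position inside a cluster is the vertex's label)
    and the schedule of rounds: every unordered pair of clusters, queried once.
    """
    n = len(adj)
    l = 10 * n // k  # assumes 10 n / k is an integer that divides n
    order = list(range(n))
    rng.shuffle(order)
    clusters = [order[c * l:(c + 1) * l] for c in range(n // l)]
    rounds = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return clusters, rounds

def payoff(adj, clusters, i, j, label_i, label_j):
    """Payoff 1 if the labeled vertex in cluster i is adjacent to the labeled vertex in cluster j."""
    u = clusters[i][label_i]
    v = clusters[j][label_j]
    return 1 if v in adj[u] else 0
```

An online learner would then be run against `rounds`, receiving `payoff(...)` for each pair of labels it predicts; the number of items is `len(clusters)` and the label set size is `l`.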

The distinguisher for the planted clique problem runs the online learning algorithm on the instance specified above
R times, for a suitably large number of repetitions R. This is to ensure that with constant probability, the average payoff of the algorithm over the
runs is close to the expected payoff. If the average payoff from the R runs is at least $\left(1 + \frac{1}{100}\right)\frac{1}{2}\binom{k/10}{2}$, the distinguisher
replies with planted. Otherwise, it replies with random.

Let's assume the original graph was sampled from $G(n, 1/2)$. Then, we claim that any algorithm (regardless if
efficient or not) will get an average payoff of at most $\frac{1}{2}\binom{n'}{2} + o(k^2)$ with constant probability.

The above probability is with respect to the randomness in generating the graph from $G(n, 1/2)$, the partitioning
of the vertices, and any randomness in the algorithm. Let the pair of clusters queried at time step t be $(C_{i_t}, C_{j_t})$. Let's
denote the random variable for the payoff in round t on the r-th repetition of the online learning problem as $\mathcal{P}^r_t$. Let
$G_{a,b}$ be a random 0-1 indicator variable for whether there is an edge between vertices a, b.

If $\mathcal{P}_{i_t,j_t} = \frac{1}{R}\sum_{r=1}^{R} \mathcal{P}^r_t$, then the average payoff of the algorithm is $\mathcal{P} = \sum_{t=1}^{T} \mathcal{P}_{i_t,j_t}$. We claim that the variables $\mathcal{P}_{i_t,j_t}$
are mutually independent. Indeed, this follows because the variables $G_{a,b}$, for any vertices $a \in C_{i_t}$, $b \in C_{j_t}$, are
independent of the data shown to the online learner in the first $t - 1$ rounds and the algorithm's randomness.

But, by linearity of expectation, $E\left[\sum_{t} \mathcal{P}_{i_t,j_t}\right] = \frac{1}{2}\binom{n'}{2}$, and $\mathcal{P}_{i_t,j_t}$ always is between 0 and 1. So, by Hoeffding's
inequality, for any $s > 0$,

$$\Pr\left[\mathcal{P} \ge \frac{1}{2}\binom{n'}{2} + s\right] \le e^{-2s^2/T}.$$

In particular, with constant probability, any algorithm gets average payoff of at most $\frac{1}{2}\binom{n'}{2} + o(k^2)$.

Let's proceed to the planted case. First, we claim that with constant probability, there is a fixed labeling with payoff
at least $\binom{2k/25}{2}$. Let $I_i$ be an indicator random variable for the event that no vertex from the planted clique belongs to
cluster i. The partitioning is done independently of the graph, so $\Pr[I_i = 1] \le \left(1 - \frac{l}{n}\right)^{k} \le e^{-kl/n} = e^{-10}$. Hence,
if I is a random variable for the total number of clusters which contain no vertices from the planted clique, we know
that $E[I] = \sum_i E[I_i] \le n' e^{-10}$. By Markov's inequality, $\Pr\left[I \ge \frac{n'}{5}\right] \le 5e^{-10}$.

So, with probability at least $1 - 5e^{-10}$, the number of clusters with at least one vertex from the planted clique is at least
$\frac{4}{5}n' = \frac{2k}{25}$. In this case, the labeling where we label each of the clusters with a vertex from the planted clique
has a payoff of at least $\binom{2k/25}{2}$. In the online learning instance we constructed, the number of items is n', the
number of rounds is T, and the label size is $l$. Let's assume that we can achieve regret of $\sqrt{n' l^{\beta} T}$. According to the
definition of regret, whenever the graph was a planted instance, and the partitioning resulted in a fixed labeling with
payoff at least $\binom{2k/25}{2}$, the expected payoff of the algorithm (with respect to the randomness of the algorithm) is at
least $\binom{2k/25}{2} - \sqrt{n' l^{\beta} T}$. We claim that the average payoff over the R runs of the online learning algorithm will be
close to this.


If we denote by $\mathcal{P}^r = \sum_{t=1}^{T} \mathcal{P}^r_t$ the payoff of the algorithm in the r-th repetition, then we have that
$E[\mathcal{P}^r] \ge \binom{2k/25}{2} - \sqrt{n' l^{\beta} T}$, and all the variables $\mathcal{P}^r$ are mutually independent and between 0 and $n'^2$. So, by Hoeffding's
bound,

$$\Pr\left[\frac{1}{R}\sum_{r=1}^{R} \mathcal{P}^r \le E[\mathcal{P}^r] - t\right] \le e^{-2Rt^2/n'^4}.$$

Choosing $t = o(k^2)$ appropriately lets us conclude that with probability $1 - o(1)$,
$\frac{1}{R}\sum_{r} \mathcal{P}^r \ge \binom{2k/25}{2} - \sqrt{n' l^{\beta} T} - o(k^2)$. Putting everything together, in the planted case, the average payoff is at
least $\binom{2k/25}{2} - \sqrt{n' l^{\beta} T} - o(k^2)$ with constant probability.

Recall that we also proved that in a random instance, we get a payoff at most $\frac{1}{2}\binom{k/10}{2} + o(k^2)$ with constant
probability. We claim that, for large enough k,

$$\binom{2k/25}{2} \ge \left(1 - \frac{1}{100}\right)\frac{(2k/25)^2}{2} > \left(1 + \frac{1}{100}\right)\frac{1}{2}\cdot\frac{(k/10)^2}{2} \ge \left(1 + \frac{1}{100}\right)\frac{1}{2}\binom{k/10}{2}.$$

Hence, if $\sqrt{n' l^{\beta} T} = o(k^2)$, the distinguisher constructed outputs the correct answer with constant probability. We will
show exactly that.

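First we claim that it suffices to show

$$l^{(\beta+1)/2} = o(\sqrt{n}). \quad (1)$$

Indeed, $n' = \frac{k}{10}$ and $T = \binom{n'}{2} = \Theta(k^2)$, so $\sqrt{n' l^{\beta} T} = o(k^2)$ holds exactly when $l^{\beta} = o(k)$, and multiplying both sides by $l = 10\frac{n}{k}$ gives (1).
Since $l = 10\frac{n}{k} = 10n^{\frac{1}{2}+\epsilon}$, after rearranging terms, (1) is equivalent to $n^{(\beta+1)\left(\frac{1}{2}+\epsilon\right)} = o(n)$.

Notice that $n^{\omega(1/\log n)} = \omega(1)$, so for the above it is sufficient that $(\beta + 1)\left(\frac{1}{2}+\epsilon\right) = 1 - \omega\left(\frac{1}{\log n}\right)$. But since
$\beta = \left(1 - \omega\left(\frac{1}{\log n}\right)\right)\left(\frac{2}{1+2\epsilon}\right) - 1$, the above is clearly satisfied.

□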


This quite easily will give the result that assuming Conjecture 1, achieving regret $\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$, is
hard. More precisely:

Corollary 1. Let $\epsilon = \Omega(1)$. If we can achieve regret $\sqrt{nL^{1-\epsilon}T}$ in time polynomial in n, L, T, we can distinguish
between $G(n, 1/2)$ and $G(n, 1/2, n^{1/2-\epsilon/6})$ with constant probability in polynomial time. In particular, if Conjecture 1 is
true, no polynomial time algorithm can achieve regret $\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$.

The proof of this Corollary is straightforward and relegated to Appendix A. We note that a stronger form of
Conjecture 1 is consistent with our current knowledge of planted clique. In particular, we can strengthen the claim
to allow any $k = o(\sqrt{n})$, or alternatively $k = n^{1/2-\epsilon}$, for any $\epsilon = \omega\left(\frac{1}{\log n}\right)$. In this case, Corollary 1 will imply that
achieving regret $\sqrt{n \cdot o(L) \cdot T}$ is impossible in polynomial time.


2.2 Planted dense subgraph hardness 

We next move on to the planted dense subgraph based hardness. The proofs in this section are essentially a generalization
of the planted clique hardness, so are relegated to Appendix A. We formally show:

Theorem 2. Let $\epsilon, \alpha, k$ satisfy the conditions of Conjecture 2. If regret $\sqrt{nL^{\beta}T}$ for

$$\beta = 2 \cdot \frac{\frac{1}{2} - 2\left(\frac{1}{2}+\epsilon'\right)(\alpha+\epsilon)}{\frac{1}{2}+\epsilon'} - 1$$

is achievable in time polynomial in n, L, T, then one can distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$, where
$p_s = n^{-\alpha}$, $k = n^{1/2-\epsilon'}$, $p_d = k^{-\alpha-\epsilon}$, with constant probability in polynomial time.

And again as before, assuming Conjecture 2, achieving regret $\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$, is hard. More precisely:

Corollary 2. Let $\epsilon', \alpha, \epsilon = \Omega(1)$ and $\alpha > \epsilon$. If we can achieve regret $\sqrt{nL^{1-\epsilon'-\alpha-\epsilon}T}$ in time polynomial in
n, L, T, we can distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$ in polynomial time with constant probability, where
$p_s = n^{-\alpha/8}$, $k = n^{1/2-\epsilon'/4}$, $p_d = k^{-\alpha/8-\epsilon/8}$. In particular, if Conjecture 2 is true, no polynomial time algorithm can
achieve regret $\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$.

Similarly, a stronger form of Conjecture 2 is plausible given our current knowledge. We can allow $\alpha = \omega\left(\frac{1}{\log n}\right)$,
$\epsilon = \omega\left(\frac{1}{\log n}\right)$ and $k = o(\sqrt{n})$. (These constraints are necessary in order to make sure that $n^{-\alpha} = o(1)$ and
$k^{-\alpha} = o(1)$, since unlike planted clique, we are thinking of p and q as asymptotic quantities, so we want to ensure
that $k^{-\alpha-\epsilon} = o(k^{-\alpha})$ and that both densities are $o(1)$.) In this case, Corollary 2 will imply that achieving regret $\sqrt{n \cdot o(L) \cdot T}$ is
impossible in polynomial time.

3 Improved regret bound analysis of log-determinantal regularizer 

We now move to the other result in our paper: matching the lower bound from the previous section. We show that
“Follow-the-regularized-leader” with the log-determinant-based regularizer from (Christiano, 2014a) achieves regret
$O(\sqrt{nLT})$.

We will follow the (Hazan, 2009) framework for online convex optimization. The scenario is as follows: at each
round t, the player chooses a point $x_t \in \mathcal{K}$, where $\mathcal{K}$ is some convex body. A linear payoff function is revealed, and
the player receives a payoff $\mathcal{P}_t \cdot x_t$, for some vector $\mathcal{P}_t$. The goal is to compete with the “best decision in hindsight”,
i.e. to maximize

$$\inf_{\mathcal{P}_1, \dots, \mathcal{P}_T} \left\{ E\left[\sum_{i=1}^{T} \mathcal{P}_i \cdot x_i\right] - \max_{x \in \mathcal{K}} \sum_{i=1}^{T} \mathcal{P}_i \cdot x \right\}$$

where the expectation is over the randomness of the algorithm.

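Then, “Follow-the-regularized-leader”, with a convex regularizer $\mathcal{R}(x)$, is the following algorithm:

Algorithm 1: Follow-the-regularized-leader

    $x_1 = \arg\max_{x \in \mathcal{K}} -\mathcal{R}(x)$;
    for $t \leftarrow 1$ to T do
        Predict $x_t$;
        Observe the payoff function $\mathcal{P}_t$;
        Update $x_{t+1} = \arg\max_{x \in \mathcal{K}} \eta \sum_{s=1}^{t} \mathcal{P}_s \cdot x - \mathcal{R}(x)$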
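To make the update rule concrete, here is a minimal sketch of Algorithm 1 in a much simpler setting than the one analyzed in this paper: the convex body is assumed to be the probability simplex and the regularizer is swapped for the negative entropy $\mathcal{R}(x) = \sum_i x_i \log x_i$, not the log-determinant regularizer used later. Under that assumption the argmax has the closed form $x_{t+1} \propto \exp(\eta \sum_{s \le t} \mathcal{P}_s)$, which coincides with exponential weights. The function name is hypothetical.

```python
import math

def ftrl_entropy(payoffs, eta):
    """FTRL over the probability simplex with R(x) = sum_i x_i log x_i.

    payoffs: list of payoff vectors P_1..P_T (revealed after each prediction).
    Returns the predicted points x_1..x_T.
    """
    d = len(payoffs[0])
    cumulative = [0.0] * d  # running sum of the payoff vectors seen so far
    iterates = []
    for P in payoffs:
        # Closed-form argmax of eta * <cumulative, x> - R(x) over the simplex;
        # subtracting the max keeps the exponentials numerically stable.
        m = max(cumulative)
        weights = [math.exp(eta * (c - m)) for c in cumulative]
        z = sum(weights)
        iterates.append([w / z for w in weights])
        cumulative = [c + p for c, p in zip(cumulative, P)]
    return iterates
```

The first iterate is the uniform distribution, the maximizer of $-\mathcal{R}$, matching the first line of Algorithm 1. For the pseudo-moment polytope no such closed form is available and each update is a convex program.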


The main theorem in (Hazan, 2009) is:

Theorem. (Hazan, 2009) “Follow-the-regularized-leader”, with a convex regularizer $\mathcal{R}(x)$ and an appropriate choice
of $\eta$, achieves regret $O(\sqrt{D\gamma T})$, where

$$D = \max_{x \in \mathcal{K}} |\mathcal{R}(x)|, \qquad \gamma = \max_{x \in \mathcal{K}, \mathcal{P}_t} \mathcal{P}_t^\top [\nabla^2 \mathcal{R}(x)]^{-1} \mathcal{P}_t$$

Since we are following the same approach as in (Christiano, 2014a), for us the polytope $\mathcal{K}$ will be the convex
polytope of pseudo-moments, i.e. positive semidefinite matrices $M((i,a),(j,b))$ where $1 \le i, j \le n$, $1 \le a, b \le L$, such that:

• $1 \ge M((i,a),(j,b)) \ge 0$

Then, $\mathcal{P}_t \in [-1, 1]^{nL \times nL}$, indexed by all pairs $((i, a), (j, b))$. Furthermore, for any t, there are nonzeros in $\mathcal{P}_t$ only
over a single pair $(i_t, j_t)$ (the edge that round is played on), and in that case $\mathcal{P}_t((i_t, a), (j_t, b))$ is the payoff of playing
label a on the $i_t$ vertex, and label b on the $j_t$ vertex. The payoff at round t would be simply

$$\sum_{a,b} \mathcal{P}_t((i_t, a), (j_t, b)) \cdot M((i_t, a), (j_t, b))$$

The regularizer we use is $\mathcal{R}(M) = \log\det(I + LM)$. In (Christiano, 2014a), it is shown that the diameter
parameter D is at most nL, however an additional $L^2$ factor in the analysis of the $\gamma$ parameter is lost. (While not
quite written in these terms there, the argument in the paper can be very easily cast this way.) Here, we improve that
analysis to show that in fact $\gamma \le 4$.

So, we will simply prove:

Theorem 3. For online local learning, “follow-the-regularized-leader” with a regularizer $\mathcal{R}(M) = \log\det(I + LM)$
achieves regret $O(\sqrt{D\gamma T})$, where

$$D = \max_{x \in \mathcal{K}} |\mathcal{R}(x)| \le nL, \qquad \gamma = \max_{x \in \mathcal{K}, \mathcal{P}_t} \mathcal{P}_t^\top [\nabla^2 \mathcal{R}(x)]^{-1} \mathcal{P}_t \le 4$$


3.1 Calculating the inverse Hessian of the regularizer 

We’ll prove the following lemma first: 

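Lemma 1. If $\mathcal{R}(M) = \log\det(I + LM)$, then:

$$[\nabla^2 \mathcal{R}(M)]^{-1}_{((i,a),(j,b)),((i',a'),(j',b'))} = -\frac{1}{L^2}\big(\delta((i',a'),(j,b)) + L \cdot M((i',a'),(j,b))\big)\big(\delta((i,a),(j',b')) + L \cdot M((i,a),(j',b'))\big)$$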

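Proof. Let's proceed stepwise. First, let's calculate the gradient. For this, the following theorem from matrix calculus
is very useful (where adj stands for the adjugate):

Theorem. Jacobi's Formula (Magnus and Neudecker, 1995):

$$\frac{\partial \det(B)}{\partial B_{i,j}} = \mathrm{adj}(B)^\top_{i,j} = \mathrm{adj}(B)_{j,i} = \det(B)\, B^{-1}_{j,i}$$

With this in mind, the gradient is a simple matter of applying the chain rule. To keep the notation clean, let
$w = (i, a)$, $x = (j, b)$, and calculate the gradient of $\mathcal{R}(M)$ with respect to $M_{w,x}$. We get:

$$\frac{\partial \mathcal{R}(M)}{\partial M_{w,x}} = \frac{1}{\det(I + L \cdot M)} \cdot \frac{\partial \det(I + L \cdot M)}{\partial (I + L \cdot M)_{w,x}} \cdot \frac{\partial (I + L \cdot M)_{w,x}}{\partial M_{w,x}} = L\,(I + L \cdot M)^{-1}_{x,w}$$

Again, to keep the notation lighter, let $y = (i', a')$, $z = (j', b')$. We will use a little bit of matrix calculus to show:

Lemma 2. $\frac{\partial (I + L \cdot M)^{-1}_{x,w}}{\partial M_{y,z}} = -L\,(I + L \cdot M)^{-1}_{x,y}\,(I + L \cdot M)^{-1}_{z,w}$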
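
Proof. Let's denote by $\frac{\partial X}{\partial t}$ the matrix with entries $\frac{\partial X_{i,j}}{\partial t}$. Then, we claim the following is true:

$$\frac{\partial (XY)}{\partial t} = \frac{\partial X}{\partial t} Y + X \frac{\partial Y}{\partial t}.$$

This is not hard to check: it's just due to the fact that in the matrix product XY, the entry
$(XY)_{i,j}$ is a sum of terms which are multiplications of two entries in X and Y. An application of the product rule
gives the above quite easily.

Then, we use the following trick: $BB^{-1} = I$, so by the above observation, $\frac{\partial B}{\partial t} B^{-1} + B \frac{\partial B^{-1}}{\partial t} = 0$. Hence,
$\frac{\partial B^{-1}}{\partial t} = -B^{-1} \frac{\partial B}{\partial t} B^{-1}$. Let's apply this observation to $B = (I + L \cdot M)$ and $t = M_{y,z}$:

$$\frac{\partial (I + L \cdot M)^{-1}_{x,w}}{\partial M_{y,z}} = -\left[(I + L \cdot M)^{-1}\, \frac{\partial (I + L \cdot M)}{\partial M_{y,z}}\, (I + L \cdot M)^{-1}\right]_{x,w} \quad (2)$$

$$= -\sum_{p,q} (I + L \cdot M)^{-1}_{x,p}\, \frac{\partial (I + L \cdot M)_{p,q}}{\partial M_{y,z}}\, (I + L \cdot M)^{-1}_{q,w} \quad (3)$$

Now, the term $\frac{\partial (I + L \cdot M)_{p,q}}{\partial M_{y,z}}$ is non-zero only if $p = y$, $q = z$, in which case it is equal to L. Hence, we get:

$$(3) = -L\,(I + L \cdot M)^{-1}_{x,y}\,(I + L \cdot M)^{-1}_{z,w}$$

as needed.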
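Lemma 2 can be sanity-checked numerically. The sketch below is an illustration under simplifying assumptions, not part of the proof: it uses a hypothetical 2-by-2 matrix M, treats all entries of M as independent variables (as the derivation above does), and compares a central finite difference of $(I + L \cdot M)^{-1}_{x,w}$ with the closed form $-L\,(I + L \cdot M)^{-1}_{x,y}(I + L \cdot M)^{-1}_{z,w}$.

```python
def inv2(B):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def shifted(M, L, y, z, h):
    """Return I + L*M after perturbing the single entry M[y][z] by h."""
    B = [[(1.0 if r == c else 0.0) + L * M[r][c] for c in range(2)] for r in range(2)]
    B[y][z] += L * h
    return B

def check_lemma2(M, L, x, w, y, z, h=1e-6):
    """Return (finite-difference derivative, closed form from Lemma 2)."""
    base = inv2(shifted(M, L, y, z, 0.0))
    plus = inv2(shifted(M, L, y, z, h))
    minus = inv2(shifted(M, L, y, z, -h))
    numeric = (plus[x][w] - minus[x][w]) / (2 * h)
    closed_form = -L * base[x][y] * base[z][w]
    return numeric, closed_form
```

Calling `check_lemma2` for every choice of indices on a small test matrix should return two values that agree up to the finite-difference error.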

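With this in mind, the Hessian is obvious:

$$\frac{\partial^2 \mathcal{R}(M)}{\partial M_{w,x}\, \partial M_{y,z}} = \frac{\partial}{\partial M_{y,z}} L\,(I + L \cdot M)^{-1}_{x,w} = -L^2\,(I + L \cdot M)^{-1}_{x,y}\,(I + L \cdot M)^{-1}_{z,w}$$

Let's call the Hessian matrix $H_{(w,x),(y,z)}$. We claim that the inverse $H^{-1}$ has the following explicit form:

$$H^{-1}_{(w,x),(y,z)} = -\frac{1}{L^2}\,(I + L \cdot M)_{y,x}\,(I + L \cdot M)_{w,z}$$

To show this, it's just a matter of verifying that $(HH^{-1})_{(w,x),(y,z)} = \delta((w, x), (y, z))$.

But this is easy enough:

$$(HH^{-1})_{(w,x),(y,z)} = \sum_{p,q} H_{(w,x),(p,q)}\, H^{-1}_{(p,q),(y,z)}$$

$$= \sum_{p,q} \left(-L^2 (I + L \cdot M)^{-1}_{x,p} (I + L \cdot M)^{-1}_{q,w}\right)\left(-\frac{1}{L^2} (I + L \cdot M)_{y,q} (I + L \cdot M)_{p,z}\right)$$

$$= \sum_{p} (I + L \cdot M)^{-1}_{x,p} (I + L \cdot M)_{p,z} \cdot \sum_{q} (I + L \cdot M)_{y,q} (I + L \cdot M)^{-1}_{q,w}$$

$$= \delta(x, z)\,\delta(w, y) = \delta((w, x), (y, z))$$

This finishes the proof of Lemma 1.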


□ 


□ 


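3.2 Bounding $\gamma$

Finally, we want to estimate $\gamma = \max_{x \in \mathcal{K}, \mathcal{P}_t} \mathcal{P}_t^\top [\nabla^2 \mathcal{R}(x)]^{-1} \mathcal{P}_t$, which will be relatively easy. Given the form of $\mathcal{P}_t$,
we can write this as

$$\sum_{a,b,c,d} \mathcal{P}_{a,b}\, H^{-1}_{((i_t,a),(j_t,b)),((i_t,c),(j_t,d))}\, \mathcal{P}_{c,d},$$

where $(i_t, j_t)$ is the edge chosen at timestep t,
and $\mathcal{P}_{a,b}$ is the payoff of playing label a on vertex $i_t$ and label b on vertex $j_t$. So, by Lemma 1, we want to bound the magnitude of

$$\frac{1}{L^2} \sum_{a,b,c,d} \mathcal{P}_{a,b}\,(I + L \cdot M)_{(i_t,c),(j_t,b)}\,(I + L \cdot M)_{(i_t,a),(j_t,d)}\,\mathcal{P}_{c,d}.$$

However, since $\mathcal{P}_{a,b}, \mathcal{P}_{c,d} \in [-1, 1]$ and the entries of $I + L \cdot M$ are nonnegative, it suffices to upper bound

$$\frac{1}{L^2} \sum_{a,b,c,d} (I + L \cdot M)_{(i_t,c),(j_t,b)}\,(I + L \cdot M)_{(i_t,a),(j_t,d)} = \frac{1}{L^2} \sum_{b,c} (I + L \cdot M)_{(i_t,c),(j_t,b)} \sum_{a,d} (I + L \cdot M)_{(i_t,a),(j_t,d)} = \frac{1}{L^2} \left( \sum_{e,f} (I + L \cdot M)_{(i_t,e),(j_t,f)} \right)^2$$

Then we note the following:

$$\sum_{e,f} (I + L \cdot M)_{(i_t,e),(j_t,f)} = \sum_{e,f} I_{(i_t,e),(j_t,f)} + L \cdot \sum_{e,f} M_{(i_t,e),(j_t,f)} \le L + L = 2L$$

where we have used the marginalization property of M and the definition of the identity. Hence $\gamma \le \frac{1}{L^2}(2L)^2 = 4$.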
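The block-sum computation above can be checked on a concrete pseudo-moment matrix. The sketch below makes a simplifying assumption: M is built as the true second-moment matrix of a product distribution over labelings, $M((i,a),(j,b)) = \Pr[l(i) = a, l(j) = b]$, which is one particular point satisfying the marginalization property; the function names are hypothetical.

```python
def pseudo_moments(marginals):
    """M((i,a),(j,b)) = Pr[l(i)=a, l(j)=b] when items are labeled independently."""
    n, L = len(marginals), len(marginals[0])
    M = {}
    for i in range(n):
        for a in range(L):
            for j in range(n):
                for b in range(L):
                    if i == j:
                        # An item carries exactly one label, so off-diagonal entries vanish.
                        M[(i, a), (j, b)] = marginals[i][a] if a == b else 0.0
                    else:
                        M[(i, a), (j, b)] = marginals[i][a] * marginals[j][b]
    return M

def gamma_bound(M, L, i, j):
    """(1/L^2) * (sum over e,f of (I + L*M)[(i,e),(j,f)])^2 for the pair (i, j)."""
    block = sum((1.0 if (i, e) == (j, f) else 0.0) + L * M[(i, e), (j, f)]
                for e in range(L) for f in range(L))
    return block ** 2 / L ** 2
```

For two distinct items the identity contributes nothing and the block sums to L, giving the value 1; the bound 4 is attained only in the degenerate case i = j, where the identity block contributes another L.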

4 Conclusion and open problems 

In this paper, we studied the optimal regret achievable in polynomial time for online local learning. We showed that
follow the regularized leader with a log-determinantal regularizer achieves regret $O(\sqrt{nLT})$, and we proved a matching
lower bound based both on planted clique and planted dense subgraph.

An interesting open problem is to investigate whether the regret bound can be improved when allowing sub-exponential
time algorithms, since both planted clique and planted dense subgraph admit sub-exponential time algorithms.
A natural approach is to maintain higher order pseudo-moments, following similar approaches when using the
Lasserre/Sum of Squares hierarchies. The key difficulty is the right choice of the regularizer. The log determinant regularizer
is one particular approximation of the entropy of a distribution over the set of all possible labelings, matching
the pseudo-moments that we maintain during the algorithm: it roughly corresponds to the entropy of a Gaussian with
matching second moments (Wainwright and Jordan, 2006). Even if one has access to higher order moments, it is
not clear if there is a better candidate than the log determinant.

Another open problem is basing the hardness of improving on regret $\sqrt{nLT}$ on more standard, worst case assumptions
(e.g. NP-hardness, UGC-hardness). Indeed, it isn't obvious that randomness is required for proving hardness, but it
does seem to help. This mirrors the current state of affairs in improper learning, where the only known hardness results
are either based on cryptographic assumptions or very recently, refuting random DNF formulas (Daniely et al., 2014).
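
A Relegated proofs

Corollary 1. Let $\epsilon = \Omega(1)$. If we can achieve regret $\sqrt{nL^{1-\epsilon}T}$ in time polynomial in n, L, T, we can distinguish
between $G(n, 1/2)$ and $G(n, 1/2, n^{1/2-\epsilon/6})$ with constant probability in polynomial time. In particular, if Conjecture 1 is
true, no polynomial time algorithm can achieve regret $\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$.

Proof. For ease of notation, let's call $\tilde{\epsilon} = \frac{\epsilon}{6}$. Since $\tilde{\epsilon} = \omega\left(\frac{1}{\log n}\right)$, directly applying Theorem 1, to distinguish
between $G(n, 1/2)$ and $G(n, 1/2, n^{1/2-\tilde{\epsilon}})$, it's sufficient to achieve regret $\sqrt{nL^{\beta}T}$, for $\beta = (1 - \tilde{\epsilon})\frac{2}{1 + 2\tilde{\epsilon}} - 1$. Since
$\frac{2}{1 + 2\tilde{\epsilon}} = 2 - \frac{4\tilde{\epsilon}}{1 + 2\tilde{\epsilon}}$,

$$(1 - \tilde{\epsilon})\frac{2}{1 + 2\tilde{\epsilon}} = (1 - \tilde{\epsilon})\left(2 - \frac{4\tilde{\epsilon}}{1 + 2\tilde{\epsilon}}\right) = 2 - 2\tilde{\epsilon} - \frac{4\tilde{\epsilon}}{1 + 2\tilde{\epsilon}} + \frac{4\tilde{\epsilon}^2}{1 + 2\tilde{\epsilon}} \ge 2 - 6\tilde{\epsilon} = 2 - \epsilon$$

Hence $\beta \ge 1 - \epsilon$, so if we can achieve regret $\sqrt{nL^{1-\epsilon}T}$, we can distinguish between $G(n, 1/2)$ and
$G(n, 1/2, n^{1/2-\tilde{\epsilon}})$, as we needed.

□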
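The inequality used in this proof is easy to spot-check numerically. The snippet below is a small sanity check rather than a proof: it evaluates $\beta = (1 - \tilde{\epsilon})\frac{2}{1 + 2\tilde{\epsilon}} - 1$ at a few hypothetical sample values of $\tilde{\epsilon}$ and compares it with the lower bound $1 - 6\tilde{\epsilon}$.

```python
def beta(eps_tilde):
    """Exponent from Theorem 1 with the (1 - omega(1/log n)) factor instantiated as (1 - eps_tilde)."""
    return (1 - eps_tilde) * 2 / (1 + 2 * eps_tilde) - 1

# Each tuple is (eps_tilde, beta(eps_tilde), lower bound 1 - 6 * eps_tilde).
checks = [(e, beta(e), 1 - 6 * e) for e in (0.001, 0.01, 0.05, 0.1, 1 / 6)]
for e, b, lower in checks:
    print(f"eps_tilde={e:.3f}  beta={b:.4f}  1-6*eps_tilde={lower:.4f}")
```

In every sampled case the exponent stays above the bound, consistent with the chain of inequalities in the proof.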


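Theorem 2. Let $\epsilon, \alpha, k$ satisfy the conditions of Conjecture 2. If regret $\sqrt{nL^{\beta}T}$ for

$$\beta = 2 \cdot \frac{\frac{1}{2} - 2\left(\frac{1}{2}+\epsilon'\right)(\alpha+\epsilon)}{\frac{1}{2}+\epsilon'} - 1$$

is achievable in polynomial time, then one can distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$, where $p_s = n^{-\alpha}$, $k = n^{1/2-\epsilon'}$,
$p_d = k^{-\alpha-\epsilon}$, with constant probability in polynomial time.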


Proof. We proceed in the same way as in the proof of Theorem 1. Namely, we will produce an instance for the online
learning algorithm by partitioning our graph randomly into $n' = \frac{n}{l}$ clusters, each of size $l$, where $l = 10\frac{n}{k}$. As
before, we will query all T pairs of clusters, and the payoff will be 1 if there is an edge between the labels supplied by
the learner, and 0 otherwise. Finally, we run the distinguisher R times, and we output planted if the average
payoff from the R runs is at least a constant fraction of $\binom{2k/25}{2} \cdot p_d$, and otherwise random.

As before, we claim that in the case when the graph is $G(n, p_s)$, with constant probability, any algorithm will
achieve average payoff at most $T \cdot p_s + 10\sqrt{T \cdot p_s}$.

We use the same notation as before: the pair of clusters queried at time step t is $(C_{i_t}, C_{j_t})$, the random variable for
the payoff in round t on the r-th repetition of the online learning problem is $\mathcal{P}^r_t$, and $G_{a,b}$ is a random 0-1 indicator
variable for whether there is an edge between vertices a, b.

For the same reasons as before, the variables $\mathcal{P}_{i_t,j_t}$ are mutually independent. Furthermore,
$E\left[\sum_{t=1}^{T} \mathcal{P}_{i_t,j_t}\right] = T \cdot p_s$, and $\mathcal{P}_{i_t,j_t}$ always is between 0 and 1. So, by Chernoff,

$$\Pr\left[\sum_{t=1}^{T} \mathcal{P}_{i_t,j_t} \ge T \cdot p_s \left(1 + \frac{10}{\sqrt{T p_s}}\right)\right] \le e^{-100/3}, \quad \text{i.e.} \quad \Pr\left[\sum_{t=1}^{T} \mathcal{P}_{i_t,j_t} \ge T \cdot p_s + 10\sqrt{T p_s}\right] \le e^{-100/3}.$$

In particular, with probability at least $1 - e^{-100/3}$, any algorithm
gets payoff at most $T \cdot p_s + 10\sqrt{T \cdot p_s}$.

In the planted case, completely the same as in Theorem 1, with probability $1 - 5e^{-10}$, there will be at least
$\frac{2k}{25}$ clusters which contain a vertex from the planted graph.


Conditioned on the above event happening, we claim that any labeling that chooses the vertex from the planted
graph in the clusters that contain one achieves a payoff of at least $\binom{2k/25}{2} \cdot p_d - 10\sqrt{\binom{2k/25}{2} \cdot p_d}$ with constant probability.
To show this, first notice that conditioned on belonging to two different clusters, the existence of an edge
between two vertices in the planted graph is a Bernoulli 0-1 variable, which is 1 with probability $p_d$. This is true since
the partitioning is done independently from the graph. But then, the payoff is at least $\binom{2k/25}{2} \cdot p_d - 10\sqrt{\binom{2k/25}{2} \cdot p_d}$
with probability at least $1 - e^{-100/3}$ by Chernoff.

Hence, in the planted case, again, with constant probability, there is a fixed labeling with payoff at least
$\binom{2k/25}{2} \cdot p_d - 10\sqrt{\binom{2k/25}{2} \cdot p_d}$. If the regret is $\sqrt{n' l^{\beta} T}$, and such a labeling exists, using a Hoeffding bound as before, with
probability at least $1 - o(1)$ the average payoff will be at least $\binom{2k/25}{2} \cdot p_d - 10\sqrt{\binom{2k/25}{2} \cdot p_d} - \sqrt{n' l^{\beta} T} - o(k^2 p_d)$. But since
$p_s = o(p_d)$ and $k^2 p_d = \omega(1)$, if the regret $\sqrt{n' l^{\beta} T}$ is such that $\sqrt{n' l^{\beta} T} = o(k^2 \cdot p_d)$, the distinguisher constructed
outputs the correct answer with constant probability.

Since $\frac{n}{l} = \Theta(k)$, it's sufficient to show:

$$l^{(\beta+1)/2} = o(\sqrt{n} \cdot p_d) \quad (4)$$

Plugging in $l = 10\frac{n}{k} = 10n^{\frac{1}{2}+\epsilon'}$, (4) is equivalent to $n^{(\beta+1)\left(\frac{1}{2}+\epsilon'\right)} = o\left(n^{1 - 2\left(\frac{1}{2}-\epsilon'\right)(\alpha+\epsilon)}\right)$.

As before, for this it's sufficient that

$$(\beta + 1)\left(\frac{1}{2}+\epsilon'\right) = 1 - 2\left(\frac{1}{2}-\epsilon'\right)(\alpha + \epsilon) - \omega\left(\frac{1}{\log n}\right).$$

It's easy to check for our choice of $\beta$ that this is satisfied, which finishes the proof.

□


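Corollary 2. Let $\epsilon', \alpha, \epsilon = \Omega(1)$ and $\alpha > \epsilon$. If we can achieve regret $\sqrt{nL^{1-\epsilon'-\alpha-\epsilon}T}$ in polynomial time, we can
distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$ in polynomial time with constant probability, where $p_s = n^{-\alpha/8}$,
$k = n^{1/2-\epsilon'/4}$, $p_d = k^{-\alpha/8-\epsilon/8}$. In particular, if Conjecture 2 is true, no polynomial time algorithm can achieve regret
$\sqrt{nL^{1-\delta}T}$, for any $\delta = \Omega(1)$.

Proof. For notational ease, let $\tilde{\alpha} = \frac{\alpha}{8}$, $\tilde{\epsilon} = \frac{\epsilon}{8}$, $\tilde{\epsilon}' = \frac{\epsilon'}{4}$.

First, notice that $p_s = o(p_d)$. Indeed, since $p_s = n^{-\tilde{\alpha}}$ and $p_d = k^{-\tilde{\alpha}-\tilde{\epsilon}}$,

$$p_s = o(p_d) \Leftrightarrow n^{-\tilde{\alpha}} = o\left(n^{-\left(\frac{1}{2}-\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon})}\right)$$

However, since $\alpha > \epsilon$,

$$\tilde{\alpha} > \frac{1}{2}(\tilde{\alpha} + \tilde{\epsilon}) = \left(\frac{1}{2} - \tilde{\epsilon}'\right)(\tilde{\alpha} + \tilde{\epsilon}) + \tilde{\epsilon}'(\tilde{\alpha} + \tilde{\epsilon})$$

Since $\epsilon', \alpha, \epsilon = \Omega(1)$, clearly this implies $n^{-\tilde{\alpha}} = o\left(n^{-\left(\frac{1}{2}-\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon})}\right)$.

Since clearly $\tilde{\epsilon}, \tilde{\epsilon}', \tilde{\alpha} = \omega\left(\frac{1}{\log n}\right)$, directly applying Theorem 2, to distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$,
where $k = n^{1/2-\tilde{\epsilon}'}$, $p_s = n^{-\tilde{\alpha}}$ and $p_d = k^{-\tilde{\alpha}-\tilde{\epsilon}}$, achieving regret $\sqrt{nL^{\beta}T}$ is sufficient, for

$$\beta = 2 \cdot \frac{\frac{1}{2} - 2\left(\frac{1}{2}+\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon})}{\frac{1}{2}+\tilde{\epsilon}'} - 1 = \frac{1 - 4\left(\frac{1}{2}+\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon})}{\frac{1}{2}+\tilde{\epsilon}'} - 1 = 1 - \frac{2\tilde{\epsilon}' + 4\left(\frac{1}{2}+\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon})}{\frac{1}{2}+\tilde{\epsilon}'}$$

$$\ge 1 - 4\tilde{\epsilon}' - 8\left(\frac{1}{2}+\tilde{\epsilon}'\right)(\tilde{\alpha}+\tilde{\epsilon}) \ge 1 - 4\tilde{\epsilon}' - 8\tilde{\alpha} - 8\tilde{\epsilon} = 1 - \epsilon' - \alpha - \epsilon$$

where the next to last inequality holds since $\tilde{\epsilon}' > 0$ and the last since $\tilde{\epsilon}' < \frac{1}{2}$.
So, if we can achieve regret $\sqrt{nL^{1-\epsilon'-\alpha-\epsilon}T}$,
we can distinguish between $G(n, p_s)$ and $G(n, p_s, k, p_d)$, as we needed.

□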